codegen: handle unaligned K for TN #3679

jfactory07 · 2026-01-07T10:10:20Z

Motivation

Enable new address modes: Add support for B address interleave and K-alignment (KRingShift) in TensileLite codegen.
Correctness in tail paths: Ensure tail global reads behave correctly when KRingShift (KRS) is disabled at runtime (sgprKRingShift==0) by falling back to the original load-only behavior.
Performance: Reduce redundant tail offset work by hoisting invariants and interleaving offset-apply with buffer_load in the tail path.

Technical Details

BAddrInterleave (e77420a)
- Adds BAddrInterleave validation/knob and ISA capability wiring.
- Computes runtime G once and reuses it across SRD/address calculations (kept live in SGPRs).
KRingShift align-k (cec7d49)
- Adds KRingShift knob and per-workgroup initialization of sgprKRingShift based on cacheline constraints.
- Applies KRS adjustment to computed global addresses and introduces reference-style tail offset remap macros.
Tail-path refinements (5ef3eb7)
- Moves KRS tail offset patching to just-in-time per-load emission (setup once; apply right before each load).
- Adds runtime branching so sgprKRingShift==0 takes a no-KRS load-only path; otherwise executes the KRS-enabled interleaved path (including shared A/B label flow when applicable).
- Fixes tail LDS “zero-out mask” control flow to be conditional (only skip when aligned) and skips the mask sequence when KRS is enabled (since KRS already forces safe OOB behavior).
- Ensures SGPR cleanup: emits .set ... , UNDEF for KRS/BInterleaveG after last use to avoid accidental remapping.
Macro/rocisa robustness (448c4d6)
- Converts KRS tail offset macros to rocisa.code.Macro API.
- Fixes RegisterContainer::toString() handling for macro arg register ranges to prevent invalid expansions.
- Forces specific literals to print as intended (e.g., 0xffffffff).

Test Plan

Codegen build: Run Tensile library generation for gfx950 (e.g., asm-debug + keep-build-tmp) and confirm codegen completes without asm errors.
Assembly inspection:
- Confirm KRS markers/macros appear as expected and macro expansions are valid.
- Validate tail-path behavior:
  - sgprKRingShift==0 → load-only path (no KRS offset apply)
  - sgprKRingShift!=0 → KRS-enabled interleaved path
Runtime validation:
- Run hipblaslt-bench with cases that toggle KRS enable/disable (shapes where cacheline congruence permits/disables KRS) and compare correctness vs baseline.

Test Result

Benchmark for 2048x3072x1880 TN: 9% uplift.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py

aazz44ss · 2026-01-12T08:44:21Z

projects/hipblaslt/tensilelite/Tensile/Common/ValidParameters.py

+    # K ring-shift (restricted): apply a per-WG shift along the summation (K) dimension so that
+    # the B-side base K address for each workgroup is cacheline-aligned/congruent, while preserving
+    # correctness via full-loop ring wrap. Intended for TN/NN-like B (TLUB == False).
+    "KRingShift": [False, True],


Have you added default values for these two new parameters?

Yes, their default values are currently set to false.
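The "full-loop ring wrap" mentioned in the KRingShift comment can be illustrated on the host side. The sketch below is not the kernel's implementation; `ring_shifted_dot` is a hypothetical helper that only shows why starting the K loop at a shifted index and wrapping around still visits every K element exactly once.

```python
def ring_shifted_dot(a, b, shift):
    """Dot product over K that starts at index `shift` and wraps around.

    Hypothetical illustration only: the real kernel applies the shift to
    global read addresses, not to Python lists.
    """
    k = len(a)
    assert len(b) == k and 0 <= shift < max(k, 1)
    total = 0
    for i in range(k):
        idx = (i + shift) % k  # ring wrap: every K index is visited exactly once
        total += a[idx] * b[idx]
    return total


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
baseline = sum(x * y for x, y in zip(a, b))
results = [ring_shifted_dot(a, b, s) for s in range(len(a))]
```

With integer inputs every shift reproduces the unshifted sum exactly. With floating-point inputs the reordering may change rounding slightly, so comparisons against a baseline would normally use a tolerance.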


codegen: BAddrInterleave

e77420a

github-actions bot added the project: hipblaslt label Jan 7, 2026

assistant-librarian bot added the organization: ROCm label Jan 7, 2026

jfactory07 added 2 commits January 9, 2026 06:04

codegen: implement align-k

cec7d49

refine

5ef3eb7

b-shi reviewed Jan 9, 2026

projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py (outdated, resolved)

refine macro

448c4d6

aazz44ss reviewed Jan 12, 2026

jfactory07 added 7 commits January 12, 2026 09:14

codegen: do NOT overwrite the original stride SGPRs in-place.

c049b0c

codeGen : refine default value

9003fda

codegen : refine get

8b4554d

host restriction: If n divided by MT1 is not a power of two, address interleave cannot be enabled

b3a42a7

host restriction: change to :

63d386a

- require tiles1 = SizeJ / MT1 to be an integer (SizeJ % MT1 == 0)
- require lowbit(tiles1) > 1 so that G = min(lowbit(tiles1), LVCB) is > 1 (enabled)

Note: if lowbit(tiles1) == 1, then G == 1 and the kernel disables BAddrInterleave.
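The host restriction in this commit message can be sketched in a few lines. This is an illustration under stated assumptions, not the code added to Tensile: `lowbit` is taken to mean the largest power of two dividing its argument, `b_addr_interleave_group` is a hypothetical name, and `lvcb` is passed in as a plain integer rather than derived from the kernel's configuration.

```python
from typing import Optional


def lowbit(x: int) -> int:
    """Largest power of two that divides x (assumes x > 0)."""
    return x & -x


def b_addr_interleave_group(size_j: int, mt1: int, lvcb: int) -> Optional[int]:
    """Return G when the restriction passes, otherwise None.

    Hypothetical helper mirroring the commit message:
      - tiles1 = SizeJ / MT1 must be an integer
      - lowbit(tiles1) must be > 1
      - G = min(lowbit(tiles1), LVCB); G == 1 means interleave is disabled
    """
    if size_j % mt1 != 0:
        return None
    tiles1 = size_j // mt1
    if lowbit(tiles1) <= 1:
        return None
    g = min(lowbit(tiles1), lvcb)
    return g if g > 1 else None
```

For example, `b_addr_interleave_group(3072, 256, 8)` gives tiles1 = 12, lowbit(12) = 4, and therefore G = 4, while an odd tile count such as 3072 / 1024 = 3 is rejected.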

codegen: remove BInterleaveG guard from kernel's runtime

74c6e7a

host restriction: add AssertKRingShiftAlignedK

e682def

aazz44ss reviewed Jan 14, 2026

projects/hipblaslt/tensilelite/Tensile/Common/ValidParameters.py (resolved)

aazz44ss reviewed Jan 14, 2026

projects/hipblaslt/tensilelite/Tensile/KernelWriterAssembly.py (outdated, resolved)

jfactory07 added 13 commits January 15, 2026 08:21

add: AssertKRingShiftTailWrapOnly

238fdc7

codegen: shift = (-baseOffsetElems) mod cacheLineElements

10b03bb

refine macro

2f6cc19

codegen: refine tail for krs

e871991

tailStartChunk = ceil(KRingShift / chunkElems)

c1c0aa7
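The two formulas named in commits 10b03bb and c1c0aa7 can be written out as a short host-side sketch. The function names are hypothetical, and the interpretation of the operands (a base offset and a cacheline size both measured in elements, and a tail processed in fixed-size chunks) is an assumption read from the identifiers, not confirmed by the PR.

```python
def k_ring_shift(base_offset_elems: int, cache_line_elems: int) -> int:
    """shift = (-baseOffsetElems) mod cacheLineElements.

    Python's % returns a non-negative result for a positive modulus, so the
    shift is the distance from the base offset up to the next multiple of
    cache_line_elems (0 when already aligned).
    """
    return (-base_offset_elems) % cache_line_elems


def tail_start_chunk(shift: int, chunk_elems: int) -> int:
    """tailStartChunk = ceil(KRingShift / chunkElems), using integer math."""
    return -(-shift // chunk_elems)


shift = k_ring_shift(100, 64)         # base offset 100, 64 elements per line
aligned = (100 + shift) % 64          # offset after applying the shift
first_chunk = tail_start_chunk(shift, 8)
```

Here a base offset of 100 elements with 64 elements per cacheline gives a shift of 28, which moves the offset to 128, and a chunk size of 8 yields a starting chunk of 4. An offset that is already aligned produces a shift of 0, which corresponds to the sgprKRingShift==0 case described in the PR.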

fix error

0e59506

clean code

51f1fe7

clean code

0ecb0db

clean code

60e14e4

refine restriction

db885c5

refine comments

b718bda

enable

67b5d1a

add test

11dfdd0

jfactory07 added 2 commits January 22, 2026 07:13

add test case

b6a3b8f

Merge branch 'develop' into users/jzhou/address-interleave

be6e173

jfactory07 marked this pull request as ready for review January 23, 2026 06:06

jfactory07 requested a review from a team as a code owner January 23, 2026 06:06

jfactory07 changed the title from "codegen: BAddrInterleave" to "codegen: handle unaligned K for TN" on Jan 23, 2026
